While machine learning models have achieved unprecedented success in real-world applications, they may make biased or unfair decisions for specific demographic groups and hence produce discriminatory outcomes. Although research efforts have been devoted to measuring and mitigating bias, they mainly study bias from a result-oriented perspective while neglecting the bias encoded in the decision-making procedure. This inability to capture procedure-oriented bias limits how fully a method can debias. Fortunately, with the rapid development of explainable machine learning, explanations for predictions are now available to provide insight into the decision procedure. In this work, we bridge the gap between fairness and explainability by presenting a novel perspective of procedure-oriented fairness based on explanations. We identify procedure-based bias by measuring the gap in explanation quality between different groups with Ratio-based and Value-based Explanation Fairness. These new metrics further motivate us to design an optimization objective to mitigate procedure-based bias, which we observe also mitigates bias in the predictions. Based on this objective, we propose a Comprehensive Fairness Algorithm (CFA) that simultaneously fulfills multiple objectives - improving traditional fairness, satisfying explanation fairness, and maintaining utility performance. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed CFA and highlight the importance of considering fairness from the explainability perspective. Our code is publicly available at https://github.com/YuyingZhao/FairExplanations-CFA .
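To make the "gap in explanation quality between groups" idea concrete, here is a minimal sketch in Python. The function name, the binary group encoding, and the use of a simple mean-difference are illustrative assumptions; the paper's Ratio-based and Value-based Explanation Fairness metrics are defined more precisely.

```python
def explanation_fairness_gap(scores, groups):
    """Absolute gap in mean explanation-quality score between two
    demographic groups (a simplified reading of the Value-based
    Explanation Fairness idea: a smaller gap means the explanation
    procedure treats the groups more equally)."""
    g0 = [s for s, g in zip(scores, groups) if g == 0]
    g1 = [s for s, g in zip(scores, groups) if g == 1]
    return abs(sum(g0) / len(g0) - sum(g1) / len(g1))

# Toy example: group 1 receives systematically lower-quality explanations.
gap = explanation_fairness_gap([0.9, 0.8, 0.4, 0.3], [0, 0, 1, 1])
```

A debiasing objective in this spirit would add such a gap term to the training loss alongside the utility and traditional-fairness terms.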
Graph neural networks (GNNs) have been successfully adopted in recommender systems through message passing that implicitly captures collaborative effects. However, most existing message-passing mechanisms for recommendation are directly inherited from GNNs without scrutinizing whether the captured collaborative effects actually benefit the prediction of user preferences. In this paper, we first analyze how message passing captures collaborative effects and propose a recommendation-oriented topological metric, the Common Interacted Ratio (CIR), which measures the level of interaction between a specific neighbor of a node and the rest of its neighbors. After demonstrating the benefit of collaborating with neighbors of higher CIR, we propose a recommendation-tailored GNN, the Collaboration-Aware Graph Convolutional Network (CAGCN), which goes beyond the 1-Weisfeiler-Lehman (1-WL) test in distinguishing non-bipartite-subgraph-isomorphic graphs. Experiments on six benchmark datasets show that the best CAGCN variant outperforms the most representative GNN-based recommendation model, LightGCN, by nearly 10% in Recall@20 and achieves roughly an 80% speedup. Our code is publicly available at https://github.com/yuwvandy/cagcn .
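A rough sketch of the CIR idea follows. The adjacency representation, node names, and the overlap-ratio formula are illustrative assumptions, not the paper's exact definition: a neighbor v of node u is scored by how much v's interactions overlap with those of u's remaining neighbors.

```python
def common_interacted_ratio(adj, u, v):
    """Hypothetical simplification of CIR: the fraction of v's
    interactions that are shared with u's other neighbors."""
    others = adj[u] - {v}
    rest = set().union(*(adj[o] for o in others)) if others else set()
    return len(adj[v] & rest) / len(adj[v]) if adj[v] else 0.0

# Toy user-item interaction graph (stored symmetrically).
adj = {
    "u1": {"i1", "i2"}, "u2": {"i2", "i3"},
    "i1": {"u1"}, "i2": {"u1", "u2"}, "i3": {"u2"},
}
cir = common_interacted_ratio(adj, "i2", "u1")
```

Under this reading, a message-passing layer could weight (or prune) messages from low-CIR neighbors, since their collaborative signal is less likely to reflect the target node's preferences.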
Graph neural networks (GNNs) have shown satisfying performance on various graph analytical problems and have thus become the de facto solution in a variety of decision-making scenarios. However, GNNs can yield biased results against certain demographic subgroups. Some recent works have empirically shown that the biased structure of the input network is a significant source of bias in GNNs. Nevertheless, no study has systematically scrutinized which part of the input network structure leads to biased predictions for any given node. This low transparency on how the input network structure influences the bias of GNN outcomes largely limits the safe adoption of GNNs in various decision-making scenarios. In this paper, we study the novel research problem of structural explanation of bias in GNNs. Specifically, we propose a novel post-hoc explanation framework that identifies two edge sets which maximally account for the exhibited bias and maximally contribute to the fairness level of the GNN prediction for any given node, respectively. Such explanations not only provide a comprehensive understanding of the bias/fairness of GNN predictions but also have practical significance for building effective yet fair GNN models. Extensive experiments on real-world datasets validate the effectiveness of the proposed framework in delivering effective structural explanations for bias in GNNs. Open-source code can be found at https://github.com/yushundong/referee .
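The edge-attribution idea can be sketched as a simple leave-one-out loop. Everything here is an illustrative assumption (the function names, the toy bias measure, and the greedy scoring); the actual framework learns the edge sets rather than enumerating them, but the sketch conveys what "an edge set that maximally accounts for bias" means.

```python
def top_bias_edges(edges, bias_fn, k=2):
    """Hypothetical leave-one-out attribution: score each edge by how
    much the bias measure drops when it is removed, and return the k
    edges with the largest drop."""
    base = bias_fn(edges)
    scored = [(base - bias_fn([e for e in edges if e != cand]), cand)
              for cand in edges]
    scored.sort(reverse=True)
    return [e for _, e in scored[:k]]

# Toy bias measure: the graph is "biased" only while edge (0, 1) exists.
edges = [(0, 1), (1, 2), (2, 3)]
toy_bias = lambda es: 1.0 if (0, 1) in es else 0.0
top = top_bias_edges(edges, toy_bias, k=1)
```

Scoring edges by their contribution to a *fairness* measure instead would yield the complementary edge set described in the abstract.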
Graph neural networks (GNNs) have shown great power in learning node representations on graphs. However, they may inherit historical prejudices from training data, leading to discriminatory bias in predictions. Although some works have developed fair GNNs, most of them directly borrow fair representation learning techniques from non-graph domains without considering the potential problem of sensitive attribute leakage caused by feature propagation in GNNs. We empirically observe that feature propagation can change the correlation of previously innocuous features with the sensitive ones. This can be viewed as a leakage of sensitive information that may further exacerbate discrimination in predictions. We therefore design two feature masking strategies based on feature correlations, highlighting the importance of considering feature propagation and correlation variation in alleviating discrimination. Motivated by this analysis, we propose the Fair View Graph Neural Network (FairVGNN) to generate fair views of features by automatically identifying and masking sensitive-correlated features while accounting for the correlation variation after feature propagation. Given the learned fair views, we adaptively clamp the weights of the encoder to avoid using sensitive-related features. Experiments on real-world datasets demonstrate that FairVGNN achieves a better trade-off between model utility and fairness. Our code is publicly available at https://github.com/yuwvandy/fairvgnn .
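A minimal sketch of correlation-based feature masking follows. The Pearson-correlation criterion, the hard zero-masking, and the threshold value are illustrative assumptions (FairVGNN learns the masks and clamps encoder weights rather than thresholding), but the sketch shows the core intuition: drop feature columns that correlate strongly with the sensitive attribute.

```python
def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def mask_sensitive_correlated(X, s, threshold=0.5):
    """Zero out feature columns of X whose absolute correlation with
    the sensitive attribute s exceeds `threshold`."""
    keep = [abs(pearson(list(col), s)) <= threshold for col in zip(*X)]
    return [[v if k else 0.0 for v, k in zip(row, keep)] for row in X]

# Toy data: column 0 perfectly tracks the sensitive attribute, column 1 is uncorrelated.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
s = [1, 1, 0, 0]
masked = mask_sensitive_correlated(X, s)
```

The abstract's key observation is that this check must be applied to *propagated* features, since correlations of originally innocuous columns can change after message passing.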
Graph neural networks (GNNs) have achieved unprecedented success in learning graph representations to identify the categorical labels of graphs. However, most existing graph classification work with GNNs follows a balanced data-splitting protocol, which is misaligned with many real-world scenarios where some classes have far fewer labels than others. Directly training GNNs in this imbalanced situation may lead to uninformative representations of graphs in the minority classes and compromise the overall performance of downstream classification, which underlines the importance of developing effective GNNs for imbalanced graph classification. Existing methods are either tailored for non-graph structured data or designed specifically for imbalanced node classification, while few focus on imbalanced graph classification. To this end, we introduce a novel framework, Graph-of-Graph Neural Networks (G$^2$GNN), which alleviates the graph imbalance issue by deriving extra supervision globally from neighboring graphs and locally from the graphs themselves. Globally, we construct a graph of graphs (GoG) based on kernel similarity and perform GoG propagation to aggregate neighboring graph representations, which are initially obtained by node-level propagation with pooling via a GNN encoder. Locally, we employ topological augmentation via node masking or edge dropping to improve the model's generalizability in discerning the topology of unseen testing graphs. Extensive graph classification experiments on seven benchmark datasets demonstrate that our proposed G$^2$GNN outperforms numerous baselines by roughly 5% in both F1-macro and F1-micro scores. The implementation of G$^2$GNN is available at https://github.com/yuwvandy/g2gnn .
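The global branch can be sketched as a k-nearest-neighbor smoothing over graph-level representations. The function name, the plain averaging, and the toy similarity matrix are illustrative assumptions; in the paper the similarities come from a graph kernel and the representations from a pooled GNN encoder.

```python
def gog_propagate(reps, sim, k=1):
    """Sketch of graph-of-graphs propagation: link each graph to its k
    most similar graphs (by a precomputed kernel-similarity matrix
    `sim`) and average its representation with theirs."""
    n = len(reps)
    out = []
    for i in range(n):
        nbrs = sorted((j for j in range(n) if j != i),
                      key=lambda j: sim[i][j], reverse=True)[:k]
        pooled = [reps[i]] + [reps[j] for j in nbrs]
        out.append([sum(vals) / len(pooled) for vals in zip(*pooled)])
    return out

# Toy graph-level representations and a symmetric similarity matrix.
reps = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sim = [[1.0, 0.2, 0.9], [0.2, 1.0, 0.5], [0.9, 0.5, 1.0]]
smoothed = gog_propagate(reps, sim, k=1)
```

For a minority-class graph, this pooling borrows signal from its most similar neighbors, which is the extra global supervision the abstract refers to.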
Autonomous vehicles are being deployed with a spectrum of capability, extending from driver assistance features for the highway in personal vehicles (SAE Level 2+) to fully autonomous fleet ride sharing services operating in complex city environments (SAE Level 4+). This spectrum of autonomy often operates in different physical environments with different degrees of assumed driver in-the-loop oversight and hence has very different system and subsystem requirements. At the heart of SAE Level 2 to 5 systems is localization and mapping, which ranges from road determination for feature geofencing or high-level routing, through lane determination for advanced driver assistance, to where-in-lane positioning for full vehicle control. We assess localization and mapping requirements for different levels of autonomy and supported features. This work provides a framework for system decomposition, including the level of redundancy needed to achieve the target level of safety. We examine several representative autonomous and assistance features and make recommendations on positioning requirements as well as map georeferencing and information integrity.
We present Azimuth, an open-source and easy-to-use tool to perform error analysis for text classification. Compared to other stages of the ML development cycle, such as model training and hyper-parameter tuning, the process and tooling for the error analysis stage are less mature. However, this stage is critical for the development of reliable and trustworthy AI systems. To make error analysis more systematic, we propose an approach comprising dataset analysis and model quality assessment, which Azimuth facilitates. We aim to help AI practitioners discover and address areas where the model does not generalize by leveraging and integrating a range of ML techniques, such as saliency maps, similarity, uncertainty, and behavioral analyses, all in one tool. Our code and documentation are available at github.com/servicenow/azimuth.
Science tests competing theories or models by evaluating the similarity of their predictions against observational experience. Thus, how we measure similarity fundamentally determines what we learn. In machine learning and scientific modeling, similarity metrics are used as objective functions. A classic example is mean squared error, which is the optimal measure of similarity when errors are normally distributed and independent and identically distributed (iid). In many cases, however, the error distribution is neither normal nor iid, so it is left to the scientist to determine an appropriate objective. Here, we review how information theory can guide that selection, then demonstrate the approach with a simple hydrologic model.
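The link between the error distribution and the "right" objective can be shown numerically: under iid Gaussian errors with fixed variance, the negative log-likelihood is an affine function of MSE, so the two objectives rank candidate predictions identically. This small sketch (the data and candidate predictions are illustrative) demonstrates that agreement.

```python
import math

def mse(y, yhat):
    """Mean squared error between observations and predictions."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def gaussian_nll(y, yhat, sigma=1.0):
    """Negative log-likelihood of y under iid Gaussian errors N(0, sigma^2)
    around yhat. Note the sum-of-squares term: for fixed sigma, this is
    a constant plus a positive multiple of MSE."""
    n = len(y)
    const = n * math.log(sigma * math.sqrt(2 * math.pi))
    return const + sum((a - b) ** 2 for a, b in zip(y, yhat)) / (2 * sigma ** 2)

y = [1.0, 2.0, 3.0]
good, bad = [1.1, 2.0, 2.9], [0.0, 0.0, 0.0]
# Both criteria prefer `good`: the rankings coincide under Gaussian iid errors.
```

When errors are skewed, heteroscedastic, or correlated, the likelihood (and hence the information-theoretic objective) no longer reduces to MSE, which is exactly the situation the abstract addresses.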
As artificial and robotic systems are increasingly deployed and relied upon for real-world applications, it is important that they exhibit the ability to continually learn and adapt in dynamically-changing environments, becoming Lifelong Learning Machines. Continual/lifelong learning (LL) involves minimizing catastrophic forgetting of old tasks while maximizing a model's capability to learn new tasks. This paper addresses the challenging lifelong reinforcement learning (L2RL) setting. Pushing the state-of-the-art forward in L2RL and making L2RL useful for practical applications requires more than developing individual L2RL algorithms; it requires progress at the systems level, especially research into the non-trivial problem of how to integrate multiple L2RL algorithms into a common framework. In this paper, we introduce the Lifelong Reinforcement Learning Components Framework (L2RLCF), which standardizes L2RL systems and assimilates different continual learning components (each addressing a different aspect of the lifelong learning problem) into a unified system. As an instantiation of L2RLCF, we develop a standard API allowing easy integration of novel lifelong learning components. We describe a case study that demonstrates how multiple independently-developed LL components can be integrated into a single realized system. We also introduce an evaluation environment in order to measure the effect of combining various system components. Our evaluation environment employs different LL scenarios (sequences of tasks) consisting of StarCraft 2 minigames and allows for the fair, comprehensive, and quantitative comparison of different combinations of components within a challenging common evaluation environment.
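To illustrate what a standardized component API can look like, here is a hypothetical sketch. The class names, lifecycle hooks, and the replay-buffer example are all assumptions for illustration; the actual L2RLCF API may differ. The point is that independently developed components plug into a common event dispatch.

```python
class LLComponent:
    """Hypothetical component interface: components hook into the
    task lifecycle of a lifelong RL agent."""
    def on_task_start(self, task_id): pass
    def on_step(self, transition): pass
    def on_task_end(self, task_id): pass

class ReplayBuffer(LLComponent):
    """Toy component: stores every observed transition for later replay."""
    def __init__(self):
        self.buffer = []
    def on_step(self, transition):
        self.buffer.append(transition)

class LifelongSystem:
    """Dispatches lifecycle events to all registered components."""
    def __init__(self, components):
        self.components = components
    def run_task(self, task_id, transitions):
        for c in self.components:
            c.on_task_start(task_id)
        for t in transitions:
            for c in self.components:
                c.on_step(t)
        for c in self.components:
            c.on_task_end(task_id)

# Usage: one system, any number of independently developed components.
buf = ReplayBuffer()
system = LifelongSystem([buf])
system.run_task("minigame-1", [("s0", "a0", 1.0, "s1"), ("s1", "a1", 0.0, "s2")])
```

Because every component sees the same event stream, combinations of components (the subject of the paper's evaluation) can be assembled by changing only the registration list.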
Recent research in clustering face embeddings has found that unsupervised, shallow, heuristic-based methods -- including $k$-means and hierarchical agglomerative clustering -- underperform supervised, deep, inductive methods. While the reported improvements are indeed impressive, experiments are mostly limited to face datasets, where the clustered embeddings are highly discriminative or well-separated by class (Recall@1 above 90% and often nearing ceiling), and the experimental methodology seemingly favors the deep methods. We conduct a large-scale empirical study of 17 clustering methods across three datasets and obtain several robust findings. Notably, deep methods are surprisingly fragile for embeddings with more uncertainty, where they match or even perform worse than shallow, heuristic-based methods. When embeddings are highly discriminative, deep methods do outperform the baselines, consistent with past results, but the margin between methods is much smaller than previously reported. We believe our benchmarks broaden the scope of supervised clustering methods beyond the face domain and can serve as a foundation on which these methods could be improved. To enable reproducibility, we include all necessary details in the appendices, and plan to release the code.